Ziyun Zeng

Aurora: Unified Video Editing with a Tool-Using Agent

May 18, 2026
Via arXiv

MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents

May 18, 2026
Via arXiv

Sparkle: Realizing Lively Instruction-Guided Video Background Replacement via Decoupled Guidance

May 07, 2026
Via arXiv

Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance

Mar 05, 2026
Via arXiv

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

Oct 06, 2025
Via arXiv

MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models

May 26, 2025
Via arXiv

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Apr 22, 2025
Via arXiv

OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting

Mar 12, 2025
Via arXiv

MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models

Oct 13, 2024
Via arXiv

PromptFix: You Prompt and We Fix the Photo

May 27, 2024
Via arXiv